An Approach to Document Fingerprinting
Abstract
The nature of an individual document is often defined by its relationship to selected tasks, societal values, and cultural meaning. Its identifying features, whether the content is textual, aural or visual, are often delineated through descriptions of the document: for example, intended audience, coverage of topics, purpose of creation and structure of presentation, as well as relationships to other entities expressed through authorship, ownership, production process, and geographical and temporal markers. To secure a comprehensive view of a document, therefore, we must draw heavily on cognitive and/or computational resources, not only to extract and classify information at multiple scales but also to interlink it across multiple dimensions in parallel. Here we present a preliminary thought experiment in document fingerprinting, in which textual documents are visualised and analysed at multiple scales and dimensions to explore patterns on which we might capitalise.
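The abstract does not say how the multiple scales are computed. As a minimal sketch, assuming three illustrative scales chosen here and not taken from the paper (hashed character 5-grams, the most frequent words, and per-sentence word counts), a multi-scale textual fingerprint could be assembled as follows. All function names and defaults are hypothetical.

```python
import hashlib
import re
from collections import Counter


def char_ngram_hashes(text, n=5):
    """Fine scale (assumed): set of hashed character n-grams ("shingles")."""
    norm = re.sub(r"\s+", " ", text.lower()).strip()
    grams = {norm[i:i + n] for i in range(max(len(norm) - n + 1, 0))}
    # Truncated SHA-1 keeps the set compact; collisions are possible but rare.
    return {hashlib.sha1(g.encode("utf-8")).hexdigest()[:8] for g in grams}


def word_profile(text, top_k=5):
    """Middle scale (assumed): the top_k most frequent word tokens."""
    words = re.findall(r"[a-z']+", text.lower())
    # Ties are ordered by first occurrence, as Counter.most_common does.
    return [w for w, _ in Counter(words).most_common(top_k)]


def sentence_lengths(text):
    """Coarse scale (assumed): word count of each sentence, a structural signal."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]


def fingerprint(text):
    """Bundle the per-scale features into one hypothetical fingerprint."""
    return {
        "char_ngrams": char_ngram_hashes(text),
        "top_words": word_profile(text),
        "sentence_lengths": sentence_lengths(text),
    }


def jaccard(a, b):
    """Set overlap in [0, 1]; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)
```

Two fingerprints can then be compared scale by scale, for instance `jaccard(fingerprint(a)["char_ngrams"], fingerprint(b)["char_ngrams"])` for near-duplicate detection; the coarser scales would need their own distance measures, and linking scales across further dimensions (authorship, time, place) is left open here as it is in the abstract.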
Similar sources
Plagiarism checker for Persian (PCP) texts using hash-based tree representative fingerprinting
With due respect to authors' rights, plagiarism detection is one of the critical problems in the field of text mining that interests many researchers. This issue is considered a serious one in institutions of higher education. Language-independent tools exist, but they do not yield reliable results because they ignore the special features of each language. Considering the paucit...
A survey on Automatic Text Summarization
Text summarization endeavors to produce a summary version of a text while maintaining the original ideas. The textual content on the web, in particular, is growing at an exponential rate. The ability to sift through such a massive amount of data in order to extract the useful information is a major undertaking and requires an automatic mechanism to aid with the extant repository of informa...
Sediment source fingerprinting: the relationship between soil and sediment enzyme activities
Sediment source fingerprinting is needed as an autonomous tool for erosion prediction, validation of soil erosion models, monitoring of sediment budgets and, consequently, selection of soil conservation practices and sediment control methods at the catchment scale. Apportioning eroded soil among multiple sources using natural tracers is an integrated approach in soil erosion and sediment studi...
Content-based data leakage detection using extended fingerprinting
Protecting sensitive information from unauthorized disclosure is a major concern for every organization. Because an organization's employees need to access such information to carry out their daily work, data leakage detection is both an essential and a challenging task. Whether caused by malicious intent or an inadvertent mistake, data loss can result in significant damage to the organization...
Learning Document Image Features With SqueezeNet Convolutional Neural Network
The classification of various document images is considered an important step towards building a modern digital library or office automation system. Convolutional Neural Network (CNN) classifiers trained with backpropagation are considered the current state-of-the-art models for this task. However, these classifiers have two major drawbacks: the huge computational power demand for...
A Document Weighted Approach for Gender and Age Prediction Based on Term Weight Measure
Author profiling is a text classification technique used to predict the profile of the author of an unknown text by analyzing its writing style. Author profiles are characteristics of authors such as gender, age, native language, country and educational background. Existing approaches to author profiling suffer from problems such as high feature dimensionality and fail to capture th...
Publication date: 2015